PrUE: Distilling Knowledge from Sparse Teacher Networks
Authors
Abstract
Although deep neural networks have enjoyed remarkable success across a wide variety of tasks, their ever-increasing size also imposes significant overhead on deployment. To compress these models, knowledge distillation was proposed to transfer knowledge from a cumbersome (teacher) network into a lightweight (student) network. However, guidance from the teacher does not always improve the generalization of students, especially when the gap between student and teacher is large. Previous works argued that this was due to the high certainty of the teacher, which results in harder labels that are difficult to fit. To soften these labels, we present a pruning method termed Prediction Uncertainty Enlargement (PrUE) to simplify the teacher. Specifically, our method aims to decrease the teacher's certainty about the data, thereby generating softer predictions for students. We empirically investigate its effectiveness with experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet. Results indicate that students trained with sparse teachers achieve better performance. Besides, our method allows researchers to distill knowledge from deeper teachers into students, improving them further. Our code is made public at: https://github.com/wangshaopu/prue.
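To make the setup concrete, the sketch below shows the standard soft-label distillation loss together with a prediction-entropy measure of teacher uncertainty, which is the quantity PrUE seeks to enlarge by pruning. This is a minimal PyTorch illustration under assumed names and hyperparameters (distillation_loss, prediction_entropy, T, alpha), not the code released at the repository above.

```python
# Minimal sketch (not the authors' implementation): soft-label distillation
# from a (possibly pruned) teacher, plus the teacher's prediction entropy,
# an uncertainty measure that a PrUE-style pruning would aim to increase.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend KL divergence to the teacher's temperature-softened predictions
    with the usual cross-entropy to the ground-truth labels."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

def prediction_entropy(logits):
    """Average entropy of the predictive distribution; a less certain
    (e.g. sparser) teacher yields higher entropy, i.e. softer labels."""
    probs = F.softmax(logits, dim=1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
```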
Similar Resources
Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition
Speech recognition systems that combine multiple types of acoustic models have been shown to outperform single-model systems. However, such systems can be complex to implement and too resource-intensive to use in production. This paper describes how to use knowledge distillation to combine acoustic models in a way that has the best of many worlds: It improves recognition accuracy significantly,...
Distilling Knowledge from Deep Networks with Applications to Healthcare Domain
Exponential growth in Electronic Healthcare Records (EHR) has resulted in new opportunities and urgent needs for discovery of meaningful data-driven representations and patterns of diseases in Computational Phenotyping research. Deep Learning models have shown superior performance for robust prediction in computational phenotyping tasks, but suffer from the issue of model interpretability which...
Distilling Task Knowledge from How-To Communities
Knowledge graphs have become a fundamental asset for search engines. A fair amount of user queries seek information on problem-solving tasks such as building a fence or repairing a bicycle. However, knowledge graphs completely lack this kind of how-to knowledge. This paper presents a method for automatically constructing a formal knowledge base on tasks and task-solving steps, by tapping the co...
Distilling Model Knowledge
Top-performing machine learning systems, such as deep neural networks, large ensembles and complex probabilistic graphical models, can be expensive to store, slow to evaluate and hard to integrate into larger systems. Ideally, we would like to replace such cumbersome models with simpler models that perform equally well. In this thesis, we study knowledge distillation, the idea of extracting the...
Face Model Compression by Distilling Knowledge from Neurons
Recent advanced face recognition systems were built on large Deep Neural Networks (DNNs) or their ensembles, which have millions of parameters. However, the expensive computation of DNNs makes their deployment difficult on mobile and embedded devices. This work addresses model compression for face recognition, where the learned knowledge of a large teacher network or its ensemble is utilized...
Journal
Journal Title: Lecture Notes in Computer Science
Year: 2023
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-26409-2_7